Predicting Scheduling Failures in the Cloud
Cloud Computing has emerged as a key technology to deliver and manage
computing, platform, and software services over the Internet. Task scheduling
algorithms play an important role in the efficiency of cloud computing services
as they aim to reduce the turnaround time of tasks and improve resource
utilization. Several task scheduling algorithms have been proposed in the
literature for cloud computing systems, the majority relying on the
computational complexity of tasks and the distribution of resources. However,
several tasks scheduled following these algorithms still fail because of
unforeseen changes in the cloud environment. In this paper, using task
execution and resource utilization data extracted from the execution traces of
real-world applications at Google, we explore the possibility of predicting the
scheduling outcome of a task using statistical models. If we can successfully
predict task failures, we may be able to reduce the execution time of jobs by
rescheduling failed tasks earlier (i.e., before their actual failure time). Our
results show that statistical models can predict task failures with a precision
of up to 97.4% and a recall of up to 96.2%. We simulate the potential benefits
of such predictions using the GloudSim toolkit and find that they can increase
the number of finished tasks by up to 40%. We also perform a case study using
the Hadoop framework of Amazon Elastic MapReduce (EMR) and the jobs of a gene
expression correlations analysis study from breast cancer research. We find
that when extending the scheduler of Hadoop with our predictive models, the
percentage of failed jobs can be reduced by up to 45%, with an overhead of less
than 5 minutes.
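As a hedged illustration of the approach, the sketch below trains a failure-prediction classifier on per-task features with scikit-learn; the file and feature names (task_events.csv, cpu_request, etc.) are illustrative assumptions, not the paper's exact trace attributes.

    # Hypothetical sketch: predicting task failures from execution-trace
    # features with a random forest; feature names are assumptions.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import precision_score, recall_score
    from sklearn.model_selection import train_test_split

    tasks = pd.read_csv("task_events.csv")  # hypothetical dataset
    features = ["cpu_request", "mem_request", "priority",
                "resubmissions", "machine_load"]
    X, y = tasks[features], tasks["failed"]  # failed: 1 if the task failed

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              stratify=y, random_state=42)
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_tr, y_tr)

    pred = clf.predict(X_te)
    print("precision:", precision_score(y_te, pred))
    print("recall:  ", recall_score(y_te, pred))

A scheduler could then query such a model at task submission or mid-execution and reschedule tasks whose predicted failure probability exceeds a threshold.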
Is It Safe to Uplift This Patch? An Empirical Study on Mozilla Firefox
In rapid release development processes, patches that fix critical issues or
implement high-value features are often promoted directly from the development
channel to a stabilization channel, potentially skipping one or more
stabilization channels. This practice is called patch uplift. Patch uplift is
risky, because patches that are rushed through the stabilization phase can end
up introducing regressions in the code. This paper examines patch uplift
operations at Mozilla, with the aim to identify the characteristics of uplifted
patches that introduce regressions. Through statistical and manual analyses, we
quantitatively and qualitatively investigate the reasons behind patch uplift
decisions and the characteristics of uplifted patches that introduced
regressions. Additionally, we interviewed three Mozilla release managers to
understand organizational factors that affect patch uplift decisions and
outcomes. Results show that most patches are uplifted because of wrong
functionality or a crash. Uplifted patches that lead to faults tend to be
larger, and most of the faults are due to semantic or memory errors in the
patches. Also, release managers are more inclined to accept patch uplift
requests that concern certain specific components and/or that are submitted by
certain specific developers.
Comment: In proceedings of the 33rd International Conference on Software
Maintenance and Evolution (ICSME 2017).
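A minimal sketch of the kind of statistical analysis such a study performs, relating uplifted-patch characteristics (e.g., patch size) to regression risk via logistic regression; the dataset and column names are assumptions, and this is not Mozilla's actual model.

    # Hedged sketch: which patch traits correlate with regressions?
    import pandas as pd
    import statsmodels.api as sm

    uplifts = pd.read_csv("uplifted_patches.csv")  # hypothetical dataset
    X = sm.add_constant(uplifts[["lines_added", "lines_deleted",
                                 "files_touched", "developer_experience"]])
    y = uplifts["caused_regression"]  # 1 if the uplift introduced a fault

    model = sm.Logit(y, X).fit()
    print(model.summary())  # significant coefficients flag risky traits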
RePOR: Mimicking humans on refactoring tasks. Are we there yet?
Refactoring is a maintenance activity that aims to improve design quality
while preserving the behavior of a system. Several (semi)automated approaches
have been proposed to support developers in this maintenance activity, based on
the correction of anti-patterns, which are "poor" solutions to recurring design
problems. However, little quantitative evidence exists about the impact of
automatically refactored code on program comprehension, and in which context
automated refactoring can be as effective as manual refactoring. Leveraging
RePOR, an automated refactoring approach based on partial order reduction
techniques, we performed an empirical study to investigate whether
automatically refactored code structure affects the understandability of systems during
comprehension tasks. (1) We surveyed 80 developers, asking them to identify
from a set of 20 refactoring changes whether they were generated by developers or by
a tool, and to rate the refactoring changes according to their design quality;
(2) we asked 30 developers to complete code comprehension tasks on 10 systems
that were refactored by either a freelancer or an automated refactoring tool.
To make the comparison fair, for a subset of refactoring actions that introduce new
code entities, only synthetic identifiers were presented to practitioners. We
measured developers' performance using the NASA Task Load Index (TLX) for their
effort, the time that they spent performing the tasks, and their percentages of
correct answers. Our findings show that, despite current technology
limitations, it is reasonable to expect refactoring tools to match
developer-produced code.
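For intuition, the sketch below shows the kind of statistical comparison such a comprehension study relies on: a Mann-Whitney U test contrasting task completion times for tool-refactored versus developer-refactored code. The data values are made up.

    # Illustrative comparison of two participant groups (fabricated data).
    from scipy.stats import mannwhitneyu

    tool_times = [312, 287, 401, 295, 350]    # seconds, tool-refactored code
    manual_times = [298, 310, 389, 305, 342]  # seconds, manual refactoring

    stat, p = mannwhitneyu(tool_times, manual_times, alternative="two-sided")
    print(f"U={stat}, p={p:.3f}")  # p >= 0.05: no detectable difference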
Stack Overflow: A Code Laundering Platform?
Developers use Question and Answer (Q&A) websites to exchange knowledge and
expertise. Stack Overflow is a popular Q&A website where developers discuss
coding problems and share code examples. Although all Stack Overflow posts are
free to access, code examples on Stack Overflow are governed by the Creative
Commons Attribution-ShareAlike 3.0 Unported license, which developers should obey
when reusing code from Stack Overflow or posting code to Stack Overflow. In
this paper, we conduct a case study with 399 Android apps, to investigate
whether developers respect license terms when reusing code from Stack Overflow
posts (and the other way around). We found 232 code snippets in 62 Android apps
from our dataset that were potentially reused from Stack Overflow, and 1,226
Stack Overflow posts containing code examples that are clones of code released
in 68 Android apps, suggesting that developers may have copied the code of
these apps to answer Stack Overflow questions. We investigated the licenses of
these pieces of code and observed 1,279 cases of potential license violations
(related to code posted to Stack Overflow or code reused from Stack Overflow).
This paper aims to raise the awareness of the software engineering community
about potential unethical code reuse activities taking place on Q&A websites
like Stack Overflow.
Comment: In proceedings of the 24th IEEE International Conference on Software
Analysis, Evolution, and Reengineering (SANER).
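A rough sketch of one way such potential clones can be surfaced, using Jaccard similarity over token shingles; the study's actual clone detector is not specified here, and the file paths are hypothetical.

    # Hedged sketch: token-shingle similarity between two code fragments.
    import re

    def shingles(code, k=5):
        # split into identifiers and single symbols, then k-token windows
        tokens = re.findall(r"[A-Za-z_]\w*|\S", code)
        return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    so_snippet = open("so_post_1234.java").read()  # hypothetical paths
    app_method = open("app_method.java").read()
    if jaccard(shingles(so_snippet), shingles(app_method)) > 0.8:
        print("potential clone: check license compatibility (CC BY-SA 3.0)")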
TFCheck: A TensorFlow Library for Detecting Training Issues in Neural Network Programs
The increasing inclusion of Machine Learning (ML) models in safety-critical
systems like autonomous cars has led to the development of multiple
model-based ML testing techniques. One common denominator of these testing
techniques is their assumption that training programs are adequate and
bug-free. These techniques only focus on assessing the performance of the
constructed model using manually labeled data or automatically generated data.
However, their assumptions about the training program are not always true as
training programs can contain inconsistencies and bugs. In this paper, we
examine training issues in ML programs and propose a catalog of verification
routines that can be used to detect the identified issues, automatically. We
implemented the routines in a TensorFlow-based library named TFCheck. Using
TFCheck, practitioners can detect the aforementioned issues automatically. To
assess the effectiveness of TFCheck, we conducted a case study with real-world,
mutant, and synthetic training programs. Results show that TFCheck can
successfully detect training issues in ML code implementations.
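TFCheck's own API is not spelled out in this abstract; as a hedged illustration, a TensorFlow/Keras callback of the following shape can implement one such verification routine, flagging NaN losses and exploding weights during training.

    # Hedged sketch of a training-issue check, not TFCheck's actual API.
    import numpy as np
    import tensorflow as tf

    class TrainingIssueCheck(tf.keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs=None):
            loss = (logs or {}).get("loss")
            if loss is None or np.isnan(loss):
                raise RuntimeError(f"epoch {epoch}: loss is NaN or missing")
            for w in self.model.get_weights():
                if np.abs(w).max() > 1e3:  # heuristic explosion threshold
                    print(f"epoch {epoch}: suspiciously large weights")
                    break

    # usage: model.fit(x, y, epochs=10, callbacks=[TrainingIssueCheck()])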
DeepEvolution: A Search-Based Testing Approach for Deep Neural Networks
The increasing inclusion of Deep Learning (DL) models in safety-critical
systems such as autonomous vehicles has led to the development of multiple
model-based DL testing techniques. One common denominator of these testing
techniques is the automated generation of test cases, e.g., new inputs
transformed from the original training data with the aim to optimize some test
adequacy criteria. So far, the effectiveness of these approaches has been
hindered by their reliance on random fuzzing or transformations that do not
always produce test cases with good diversity. To overcome these limitations,
we propose DeepEvolution, a novel search-based approach for testing DL models
that relies on metaheuristics to ensure maximum diversity in the generated test
cases. We assess the effectiveness of DeepEvolution in testing computer-vision
DL models and find that it significantly increases the neuronal coverage of
generated test cases. Moreover, using DeepEvolution, we could successfully find
several corner-case behaviors. Finally, DeepEvolution outperformed TensorFuzz
(a coverage-guided fuzzing tool developed at Google Brain) in detecting latent
defects introduced during the quantization of the models. These results suggest
that search-based approaches can help build effective testing tools for DL
systems.
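A toy sketch of the search-based idea (not DeepEvolution itself): a (1+1) evolutionary loop that mutates image-transformation parameters to maximize a test-adequacy fitness. The fitness function here is a stub standing in for a real metric such as neuronal coverage.

    # Toy (1+1) evolutionary search over transformation parameters.
    import random

    def fitness(params):
        # stub: replace with coverage achieved by inputs transformed
        # with these parameters; here, a toy quadratic surface
        brightness, rotation = params
        return -(brightness - 0.3) ** 2 - (rotation - 15) ** 2

    params = [0.0, 0.0]  # [brightness shift, rotation in degrees]
    best = fitness(params)
    for _ in range(200):
        cand = [p + random.gauss(0, 0.5) for p in params]
        if fitness(cand) > best:  # keep mutations that raise the fitness
            params, best = cand, fitness(cand)
    print("best transformation parameters:", params)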
Reliable Malware Analysis and Detection using Topology Data Analysis
Malware is becoming increasingly complex, spreading through networks and
targeting infrastructures and personal end-user devices to collect, modify,
and destroy victim information. Malware behaviors are polymorphic, metamorphic,
and persistent: malware can hide to bypass detectors, adapt to new
environments, and even leverage machine learning techniques to better damage
its targets. This makes malware difficult to analyze and detect with
traditional endpoint detection and response or intrusion detection and
prevention systems. To defend against malware, recent work has proposed different
techniques based on signatures and machine learning. In this paper, we propose
to use an algebraic topological approach, topological data analysis (TDA), to
efficiently analyze and detect complex malware patterns. Next, we compare
different TDA techniques (i.e., persistent homology, ToMATo clustering, TDA
Mapper) and existing techniques (i.e., PCA, UMAP, t-SNE) using different
classifiers, including random forest, decision tree, XGBoost, and LightGBM. We
also propose some recommendations to deploy the best-identified models for
malware detection at scale. Results show that TDA Mapper (combined with PCA) is
better for clustering and for identifying hidden relationships between malware
clusters than PCA alone. Persistence diagrams are better at identifying
overlapping malware clusters, with lower execution time than UMAP and
t-SNE. For malware detection, malware analysts can use random forests and
decision trees with t-SNE and persistence diagrams to achieve better
performance and robustness on noisy data.
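As a hedged sketch of such a pipeline, the code below computes persistence diagrams over per-sample point clouds with the ripser package (one possible library; the paper's tooling is not stated here), summarizes them as total persistence, and feeds the summaries to a random forest. The data files are hypothetical.

    # Hedged sketch: persistence-diagram features for malware detection.
    import numpy as np
    from ripser import ripser
    from sklearn.ensemble import RandomForestClassifier

    def total_persistence(sample_window):
        # H0/H1 persistence diagrams of one sample's point cloud
        dgms = ripser(sample_window)["dgms"]
        return [np.sum(d[:, 1][np.isfinite(d[:, 1])] -
                       d[:, 0][np.isfinite(d[:, 1])]) for d in dgms]

    X = np.load("malware_windows.npy")  # hypothetical (n, points, dims) data
    y = np.load("labels.npy")           # 1 = malware, 0 = benign
    topo_features = np.array([total_persistence(w) for w in X])
    clf = RandomForestClassifier(n_estimators=200).fit(topo_features, y)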
Testing Feedforward Neural Networks Training Programs
Nowadays, we are witnessing an increasing effort to improve the performance
and trustworthiness of Deep Neural Networks (DNNs), with the aim to enable
their adoption in safety-critical systems such as self-driving cars. Multiple
testing techniques have been proposed to generate test cases that can expose
inconsistencies in the behavior of DNN models. These techniques assume
implicitly that the training program is bug-free and appropriately configured.
However, satisfying this assumption for a novel problem requires significant
engineering work to prepare the data, design the DNN, implement the training
program, and tune the hyperparameters in order to produce the model whose
corner-case behaviors current automated test data generators then search for. All
these model training steps can be error-prone. Therefore, it is crucial to
detect and correct errors throughout all the engineering steps of DNN-based
software systems and not only on the resulting DNN model. In this paper, we
gather a catalog of training issues and, based on their symptoms and their
effects on the behavior of the training program, propose practical
verification routines to detect the aforementioned issues automatically, by
continuously validating that some important properties of the learning dynamics
hold during training. Then, we design TheDeepChecker, an end-to-end
property-based debugging approach for DNN training programs. We assess the
effectiveness of TheDeepChecker on synthetic and real-world buggy DL programs
and compare it with Amazon SageMaker Debugger (SMD). Results show that
TheDeepChecker's on-execution validation of DNN-based programs' properties
succeeds in revealing several coding bugs and system misconfigurations, early
on and at a low cost. Moreover, TheDeepChecker outperforms SMD's offline
rule verification on training logs in terms of detection accuracy and DL bug
coverage.
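For illustration, one property that on-execution validation of this kind can assert is that a softmax classifier's initial loss is close to -log(1/C) for C classes; a large deviation suggests a mis-wired loss, bad initialization, or label problems. The helper below is a hedged sketch, not TheDeepChecker's API.

    # Hedged sketch: sanity-check the initial loss of a softmax classifier.
    import numpy as np
    import tensorflow as tf

    def check_initial_loss(model, x, y, num_classes, tol=0.2):
        loss = model.evaluate(x, y, verbose=0)
        loss = loss[0] if isinstance(loss, list) else loss
        expected = -np.log(1.0 / num_classes)  # equals log(C)
        if abs(loss - expected) > tol * expected:
            print(f"suspicious initial loss {loss:.3f}, "
                  f"expected ~{expected:.3f}: check weight init, labels, "
                  "or the loss wiring")

    # usage: check_initial_loss(model, x_train, y_train, num_classes=10)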
An App Performance Optimization Advisor for Mobile Device App Marketplaces
On mobile phones, users and developers rely on official marketplaces that serve
as repositories of apps. The Google Play Store and the Apple App Store, the
official marketplaces for Android and Apple products, each offer more than a
million apps. Although both repositories offer descriptions of apps, information
concerning performance is not available. Due to the constrained hardware of
mobile devices, users and developers have to meticulously manage the resources
available and they should be given access to performance information about
apps. Even if this information were available, the selection of apps would still
depend on user preferences, and it would require a huge cognitive effort to make
optimal decisions. Considering this fact, we propose APOA, a recommendation
system which can be implemented in any marketplace for helping users and
developers to compare apps in terms of performance.
APOA takes as input apps' metric values and a set of metrics to optimize. It
solves an optimization problem and generates optimal sets of apps for
different user contexts. We show how APOA works over an Android case study.
Out of 140 apps, we define typical usage scenarios and collect measurements
of power, CPU, memory, and network usage to demonstrate the benefit of using
APOA.
Comment: 18 pages, 8 figures.
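A simplified sketch of the selection idea behind such a recommender: keeping the Pareto-optimal apps over lower-is-better performance metrics. The metric values below are made up.

    # Toy Pareto-front selection over app performance metrics.
    apps = {  # app: (power mW, CPU %, memory MB, network KB/s)
        "app_a": (120, 35, 210, 12),
        "app_b": (150, 20, 180, 30),
        "app_c": (160, 40, 250, 35),  # dominated by app_a on every metric
    }

    def dominates(a, b):
        # a dominates b if it is no worse everywhere and better somewhere
        return all(x <= y for x, y in zip(a, b)) and a != b

    pareto = [name for name, m in apps.items()
              if not any(dominates(other, m)
                         for o, other in apps.items() if o != name)]
    print("Pareto-optimal apps:", pareto)  # -> ['app_a', 'app_b']

A real recommender would additionally weight these metrics by the user's context (e.g., battery level, data plan) before ranking the Pareto set.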